Algoritma is a data science education center based in Jakarta. We organize workshops and training programs to help working professionals and students gain mastery in various data science sub-fields: data visualization, machine learning, data modeling, statistical inference etc. Visit our website for all upcoming workshops.
We’ll set-up caching for this notebook given how computationally expensive some of the code we will write can get.
options(width=50)
knitr::opts_chunk$set(cache = F,tidy=TRUE, fig.align = "center")
options(scipen = 9999)
rm(list=ls())You will need to use install.packages() to install any packages that are not already downloaded onto your machine. You then load the package into your workspace using the library() function:
library(dplyr)
library(tidyr)
library(ggplot2)
library(plotly)
library(reshape2)
library(leaflet)After having learned and explored appropriate techniques on visualizing data, students are required to deploy an interactive dashboard web application using a shiny server which contain any plotting objects such as ggplot and/or leaflet that display useful insights.
Before you making the dashboard, let’s answer this question first to help you creating a dashboard with useful insight.
What is the dashboard about?
This question is self explanatory, you should know what is it about, what problem you try to solve with this dashboard, what story you try to tell to your audience.
Who is the user of your dashboard?
Knowing the user of your dashboard is very important. What division or what kind of people using this dashboard. Do you need a detail or more practical dashboard? When your user is on operational level you need a detailed dashboard but when your user is on managerial level you need simple and general dashboard that can convey the insight quickly.
Why you choose that data?
How much your understanding of that data, Is that data can solve your question? Why do you choose that variable, are they really corelated? Important to know why your choose that data so you don’t create a misleading insight which very dangerous.
When is the data collected?
Is it still relevant? For example You can’t use the data from 80s to describe how’s the traffic at current date. Since the trend is everchanging so does the answer to your question, can those very old data answer your question? Irrelevant data can create a misleading insight.
Where you put your plot, valuebox, or input etc?
Make a simple layout design, so you have a image how your end product will look like. Is it tidy enough? Easy enough for your user to understand it? Always follow 5 seconds rule. Your dashboard should provide the relevant information in about 5 seconds.
How your dashboard answer your question, hypothesis, or problem you try to solve?
Are you using a right plot? A right variable? Always start from your problem, make sure you use a right plot for right problem. For example what plot you use for see your data distribution? Are you using density plot or line plot?
In addition, students are given the freedom to use their own dataset or past datasets from previous classes. Below are the rubrics for assessment and grading, Students will get the point(s) if they :
The dashboard should contain at least 2 different input types. For example, there is a slider input and select input on the dashboard. If the dashboard only has 2 select inputs or 2 similar inputs, it will be counted as one.
All input should have an appropriate user interface. For example, a date should not use a select input but instead use the proper date input or date range input.
The dashboard should have input widgets that would give the user the ability to explore the data. Some useful input including filtering data with slider input or selecting different categories with select input. Less useful input including changing the color of the plot.
The dashboard should contain at least 3 different tabs/page that convey different information.
All plots should be presented as interactive plots. You can either use plotly, highcharter, echarts4r, or other interactive plot packages in R.
The dashboard should contain at least 2 different plot types. We expect you can explore different visualizations to convey different information. For example, a dashboard contains 2 different plots: a bar chart and a line plot. The number of plots itself is not limited.
All information should be presented with appropriate plots. For example, if you want to show categorical ranking, you can use bar chart or lollipop chart. You can refer to data-to-viz for guidance.
The plot should be able to react to the change given by input.
All plots should have clear information and are easy to understand. There should be at least a plot title and clear axis title. You can refer to this notes regarding this problem.
The dashboard should be deployed to shinyapps.io and contain no error after being deployed.
We don’t expect you to be a great UI designer. However, your dashboard page should be tidy and clean enough to watch. Some considerations to help you create a tidy page including:
Some considerations to help you create a tidy plot including:
The plot should have a customized tooltip that has a clear and easy to read popup text.
Bad Tooltip
Bad tooltip typically use the original column name that sometimes are hard to read. The number is still in raw value, for example the GDP per Capita for Gabon is 13206.4845. The number will be easier to read if presented as 13,206.4845 with comma separator for every 3 digits.
gdpPercap: 13206.4845
lifeExp: 56.735
Good Tooltip
Gabon (name of the country)
GDP per Capita: 13,206.48
Life Expectancy: 56.735
Color must be carefully chosen to serve a purpose and it must be clear and do not distract the reader. One of the most common pitfalls is using color for bar charts when a large number of categories are present. You can read more about issues of color usage in this book’s chapter.
If you achieved all those criteria you will get total 30 points.
Data Visualization Track Rubics
You are expected to submit the link from your deployed shiny to the google classroom via private comment. We will evaluate your dashboard and give feedback based on the submitted link.
NOTES
If you use confidential data, you still need to deploy your dashboard and take a screenshot of your deployed dashboard. After that, you can take down the deployed dashboard on shinyapps.io and submit your capstone project in zip.
Shiny Dashboard https://rstudio.github.io/shinydashboard/
Shiny Rstudio https://shiny.rstudio.com/gallery
tychobra https://www.tychobra.com/shiny/
Algotech: Articles tagged Shiny https://algotech.netlify.app/tags/shiny/
Demo from Algoritma :
Advisory - Samuel Chan https://github.com/onlyphantom/advisory
Workforce - Tiara https://github.com/tiaradwiputri/workforce
Ecological Footprint - Nabiilah: https://github.com/NabiilahArdini/Eco-Status
Some data source reference:
options(shiny.maxRequestSize=200*1024^2)
saveRDS()
readRDS()
You can start to work and plan your project after the briefing end. We expect you can finish answering the first objective today so you can focus on building the shiny dashboard for the rest of the week.
Below is our estimated time to finish answering the first objective (5W+1H questions) during the briefing day.
Below is our example of answering the 5W+1H questions above.
I want to show how every country around the world manages its natural resources, shown by the value of Ecological Footprint and Biocapacity of these countries.
A country experiencing Ecological Deficit is indicated by the behavior of that country that imports Biocapacity through trade, liquidation of national ecological assets or emits a lot of carbon dioxide emissions into the air. Meanwhile, countries are said to have Ecological Reserves when Biocapacity (how many natural resources are owned) exceeds Ecological Footprint (how many natural resources are used). Thus, countries that have Ecological Footprint bigger than Biocapacity have the potential to suffer from various ecological impacts such as natural disasters, land damage, loss of biodiversity, and other things that can have negative impacts on the environment and the country’s economy.
For this project, I specifically want to:
This dashboard is created as a medium of education for common people regarding usage and preservation of countries natural resources.
The dataset that is suitable for this project is the ecological data acquired from Global Footprint Network (http://data.footprintnetwork.org/). The data was updated on August 12, 2019 so it is still relevant with the current condition.
footprint <- read.csv("data_input/countries.csv")
footprint <- footprint %>%
mutate(Country = as.character(Country), GDP.per.Capita = as.numeric(gsub("[$,]",
"", footprint$GDP.per.Capita)), HDI = round(HDI,
2), Countries.Required = round(Countries.Required,
2), Biocapacity.Deficit = as.factor(ifelse(Biocapacity.Deficit >
0, "Reserve", "Deficit"))) %>%
rename(Status = Biocapacity.Deficit) %>%
select(-c(Data.Quality)) %>%
drop_na()
footprintExplain how to achieve each goals or purposes stated on What question.
I will create a plot that shows the Ecological Footprint and Biocapacity for each region.
ef_region <- footprint %>%
group_by(Region) %>%
summarize(Ecological.Footprint = sum(Total.Ecological.Footprint)) %>%
arrange(desc(Ecological.Footprint)) %>%
mutate(text = paste0("Ecological Footprint: ",
Ecological.Footprint, " gha"))
ef_reg_plot <- ggplot(ef_region, aes(x = reorder(Region,
Ecological.Footprint), y = Ecological.Footprint,
text = text)) + geom_col(aes(fill = Ecological.Footprint),
show.legend = F) + coord_flip() + labs(title = "Ecological Footprint by Region",
y = "global hectares (gha)", x = NULL) + scale_y_continuous(limits = c(0,
150), breaks = seq(0, 150, 25)) + scale_fill_gradient(low = "#F78181",
high = "#3B0B0B") + theme(plot.title = element_text(face = "bold",
size = 14, hjust = 0.04), axis.ticks.y = element_blank(),
panel.background = element_rect(fill = "#ffffff"),
panel.grid.major.x = element_line(colour = "grey"),
axis.line.x = element_line(color = "grey"), axis.text = element_text(size = 10,
colour = "black"))
ggplotly(ef_reg_plot, tooltip = "text")b_region <- footprint %>%
group_by(Region) %>%
summarize(Biocapacity = sum(Total.Biocapacity)) %>%
arrange(desc(Biocapacity)) %>%
mutate(text = paste0("Biocapacity: ", Biocapacity,
" gha"))
b_reg_plot <- ggplot(b_region, aes(x = reorder(Region,
Biocapacity), y = Biocapacity, text = text)) +
geom_col(aes(fill = Biocapacity), show.legend = F) +
coord_flip() + labs(title = "Biocapacity by Region",
y = "global hectares (gha)", x = NULL) + scale_y_continuous(limits = c(0,
275), breaks = seq(0, 250, 50)) + scale_fill_gradient(low = "#9AFE2E",
high = "#0B6121") + theme(plot.title = element_text(face = "bold",
size = 14, hjust = 0.04), axis.ticks.y = element_blank(),
panel.background = element_rect(fill = "#ffffff"),
panel.grid.major.x = element_line(colour = "grey"),
axis.line.x = element_line(color = "grey"), axis.text = element_text(size = 10,
colour = "black"))
ggplotly(b_reg_plot, tooltip = "text")Create a scatterplot between human development index dengan ecological footprint
scat_plot_data <- footprint %>%
select(Country, Population.millions, GDP.per.Capita,
HDI, Total.Ecological.Footprint, Status) %>%
rename(Population.in.millions = Population.millions,
Human.Development.Index = HDI, Ecological.Footprint = Total.Ecological.Footprint) %>%
mutate(text = paste0("Country: ", Country, "<br>",
"HDI: ", Human.Development.Index, "<br>", "Ecological Footprint: ",
Ecological.Footprint, "<br>", "GDP per Capita: ",
"$", GDP.per.Capita))
scat_plot <- ggplot(scat_plot_data, aes(x = Human.Development.Index,
y = Ecological.Footprint, text = text)) + geom_smooth(col = "#61380B",
size = 0.7) + geom_point(aes(color = Status, size = GDP.per.Capita)) +
scale_y_continuous(limits = c(0, 18)) + scale_color_manual(values = c("#DF0101",
"#04B486")) + labs(title = "HDI on Ecological Footprint",
y = "Ecological Footprint", x = "Human Development Index") +
theme(plot.title = element_text(face = "bold",
size = 14, hjust = 0), panel.background = element_rect(fill = "#ffffff"),
panel.grid.major.x = element_line(colour = "grey"),
panel.grid.major.y = element_line(colour = "grey"),
axis.line.x = element_line(color = "grey"),
axis.line.y = element_line(color = "grey"),
axis.text = element_text(size = 10, colour = "black"),
legend.title = element_blank())
ggplotly(scat_plot, tooltip = "text") %>%
layout(legend = list(orientation = "v", y = 1,
x = 0))## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Create a leaflet map with relevant information for the popup
leaflet <- footprint %>%
dplyr::select(-c(2, 6:10, 12:16, 19:20))
shape <- raster::shapefile("data_input/TM_WORLD_BORDERS_SIMPL-0.3.shp")
# prepare data for color
leaf <- leaflet %>%
mutate(diff = Total.Biocapacity - Total.Ecological.Footprint) %>%
select(-Population.millions) %>%
rename(NAME = Country)
# combining data
shape@data <- shape@data %>%
left_join(leaf, by = "NAME")
# cleaning data
shape@data[shape@data$NAME == "United States", c(12:17)] <- leaf[leaf$NAME ==
"United States of America", c(2:7)]
shape@data[shape@data$NAME == "Russia", c(12:17)] <- leaf[leaf$NAME ==
"Russian Federation", c(2:7)]
shape@data[shape@data$NAME == "Venezuela", c(12:17)] <- leaf[leaf$NAME ==
"Venezuela, Bolivarian Republic of", c(2:7)]
shape@data[shape@data$NAME == "Republic of Moldova",
c(12:17)] <- leaf[leaf$NAME == "Moldova", c(2:7)]
shape@data[shape@data$NAME == "The former Yugoslav Republic of Macedonia",
c(12:17)] <- leaf[leaf$NAME == "Macedonia TFYR",
c(2:7)]
shape@data[shape@data$NAME == "Iran (Islamic Republic of)",
c(12:17)] <- leaf[leaf$NAME == "Iran, Islamic Republic of",
c(2:7)]
shape@data[shape@data$NAME == "Democratic Republic of the Congo",
c(12:17)] <- leaf[leaf$NAME == "Congo, Democratic Republic of",
c(2:7)]
shape@data[shape@data$NAME == "United Republic of Tanzania",
c(12:17)] <- leaf[leaf$NAME == "Tanzania, United Republic of",
c(2:7)]
shape@data[shape@data$NAME == "Burma", c(12:17)] <- leaf[leaf$NAME ==
"Myanmar", c(2:7)]
# Create a color palette with handmade bins.
library(RColorBrewer)
mybins <- c(-Inf, -5, 0, 5, Inf)
mypalette <- colorBin(palette = "RdYlGn", domain = shape@data$diff,
na.color = "transparent", bins = mybins)
# prepare label
mytext <- paste(shape@data$NAME) %>%
lapply(htmltools::HTML)
popup_shape <- paste("<h3><b>", shape@data$NAME, "</b></h3>",
"Status: ", shape@data$Status, "<br>", "Ecological Footprint: ",
shape@data$Total.Ecological.Footprint, " gha <br>",
"Biocapacity: ", shape@data$Total.Biocapacity,
" gha <br>", "HDI: ", shape@data$HDI, "<br>", "GDP per Capita: ",
"$", shape@data$GDP.per.Capita, "<br>", sep = "")
m <- leaflet(shape) %>%
addProviderTiles("Esri.NatGeoWorldMap") %>%
setView(lat = 10, lng = 0, zoom = 2) %>%
addPolygons(fillColor = ~mypalette(diff), color = "green",
dashArray = "3", fillOpacity = 0.6, weight = 1,
label = mytext, labelOptions = labelOptions(style = list(`font-weight` = "normal",
padding = "3px 8px"), textsize = "13px",
direction = "auto"), popup = popup_shape) %>%
addLegend(pal = mypalette, values = ~diff, opacity = 0.9,
title = paste("Remaining", "<br>", "Biocapacity (gha)"),
position = "bottomleft")
mDetermine the skecth or design for the layout of the dashboard.
On this section, the plot and information for each tab/page will be listed. The final shiny dashboard of the following example can be seen at https://nabiilahardini.shinyapps.io/Eco-Status/
Menu 1:
Menu 2:
Menu 3:
The first row shows the bar chart of ecological footprint and biocapacity of each region
The second row shows the scatter plot between Human Development Index (HDI) and the Ecological Footprint
The first row shows the leaflet map with popup that shows relevant information about the economic and ecological status.
The second row shows the proportion between biocapacity and ecological footprint between countries on the same region
The third row shows bar chart of each natural resource of a country and info box regarding their ecological status
The third tab shows the raw data.